Deep Learning - The Mathematics Behind Neural Networks - Part 2
Note: This blog is part of a learn-along series, so there may be updates and changes as we progress.
In the previous blog, we covered the foundational concepts of neural networks. In this post, we learn the mathematics behind a basic neural network structure as illustrated below:
Introduction to Neural Network Structure
Input nodes: n1 and n2
Hidden nodes: n3 to n8
Output node: n9
Each node is associated with a bias, denoted b_i, and each synapse (connection between nodes) has a weight, denoted w_i. The initial input values are x1 and x2; y denotes the target output, and ŷ denotes the output the network actually produces.
Initializing Inputs, Weights, and Biases
First, let's give the inputs, weights and biases an initial value to better visualize what is happening.
Now, let's understand how the input values work their way through the neural network. Looking at node n3 in the first hidden layer, we can see that it receives two inputs through its two synapses. As we discussed in the previous blog, a node usually does two things with its inputs: it computes the total net input, then applies an activation function to that total to produce the node's output. Following common practice, we will use ReLU as the activation function for the nodes in the hidden layers and the sigmoid for the node(s) in the output layer.
net_n3 = w1·x1 + w4·x2 + b1
net_n3 = 0.1·0.23 + 0.4·0.55 + 0.1 = 0.343
out_n3 = max(0, net_n3)
out_n3 = max(0, 0.343) = 0.343
Here is the output for the rest of the nodes in the hidden layer 1:
out_n4 = max(0, −0.079) = 0
out_n5 = max(0, 0.699) = 0.699
Now, the output of the nodes in the hidden layer 1 becomes the input of the nodes in the hidden layer 2 as shown in the diagram below.
And after repeating the same steps to find the output of the nodes we get the following outputs.
out_n6 = max(0, 0.0197) = 0.0197
out_n7 = max(0, 1.1239) = 1.1239
out_n8 = max(0, 1.3281) = 1.3281
Computing Output Layer Activation
As mentioned above, we will be using the sigmoid function, σ(x) = 1/(1 + e^−x), as the activation function for the nodes in the output layer, which in this case is only one node, n9. Applying it to n9's total net input gives the network's output, out_n9 ≈ 0.8035.
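A quick sketch of the sigmoid in Python. Note that the net input value 1.4084 below is inferred, not stated in the text: it is the value whose sigmoid matches the out_n9 ≈ 0.8035 used in the error calculation (the actual weights into n9 appear only in the article's diagram):

```python
import math

# Sigmoid activation for the output node n9: squashes any real number
# into the range (0, 1).
def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

out_n9 = sigmoid(1.4084)  # 1.4084 is inferred from out_n9 ≈ 0.8035
print(round(out_n9, 4))
```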
Our next step is to calculate the total error of the neural network. This can be done using a variety of methods, but we will use the squared error with a multiplier of ½ so that the derivative we take later on will be much cleaner.
E(y, ŷ) = ½ Σ (y − ŷ)²
Here y represents the ideal (target) output and ŷ represents the actual output of the network. Let's assume y = 0.01 to continue with our explanation.
E(y, ŷ) = ½ (y − ŷ)²
E(0.01, 0.8035) = ½ (0.01 − 0.8035)² ≈ 0.315
If we had more than one output node, we would calculate the error for each output and sum them to get the total error. But since we have only one output, we can say that E_total ≈ 0.315.
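The error calculation above translates directly to code:

```python
# Squared error with the 1/2 multiplier: E(y, y_hat) = 0.5 * (y - y_hat)^2.
# The 1/2 cancels against the exponent when we differentiate later.
def squared_error(target, predicted):
    return 0.5 * (target - predicted) ** 2

E_total = squared_error(0.01, 0.8035)
print(round(E_total, 3))
```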
Backpropagation and Weight Updates
Next, we have to do the backward pass, also known as backpropagation, which updates the weights and biases. This is done to bring the actual output closer to the ideal output, which reduces the total error in the process. Let's first try to update the weight w16. Before we update it, we must know how much a change in w16 affects the total error E_total:
∂E_total/∂w16
If we apply the chain rule to ∂E_total/∂w16, we get:

∂E_total/∂w16 = (∂E_total/∂out_n9) · (∂out_n9/∂net_n9) · (∂net_n9/∂w16)
To update the weights and biases in the hidden layers, we need to propagate the error backward from the output layer to the hidden layers. We'll start by calculating the partial derivatives for the weights of the synapses going into the second hidden layer, and then move to the weights of the synapses going into the first hidden layer.
By iterating this process (training), the total error decreases, and the neural network improves its task performance.
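Each training iteration ends with a gradient-descent step: every weight moves a small step against its gradient. The learning rate and the starting value of w16 below are illustrative placeholders, not values from the article:

```python
# Gradient-descent update rule applied after backpropagation:
# w_new = w_old - learning_rate * dE/dw
def update_weight(w, grad, learning_rate=0.5):
    return w - learning_rate * grad

w16 = 0.3          # hypothetical current value of w16
grad_w16 = 0.0025  # hypothetical gradient from backpropagation
w16 = update_weight(w16, grad_w16)
print(w16)
```

Repeating forward pass, error calculation, backpropagation, and this update over many iterations is what "training" means here.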
In this post, we've covered the mathematics behind a basic neural network, focusing on how the inputs, weights, and biases interact to produce the final output. We've walked through the process of forward propagation, calculating the output of each node, and applied the backpropagation algorithm to update the weights and biases, reducing the total error.